Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

How to Get a List of all NASDAQ Securities as a CSV file using Python?

This post will be short but very informative, and you can learn a few good Unix/Linux tricks on the way. The goal is well defined in the title. So, what’s the quickest solution? We will make use of Python in a Unix-based environment. As you will see, for any text file, a single line of Unix commands is more than enough to deliver exactly what we need (basic text-file processing). If you try to do the same in Windows... well, good luck!

In general, we need to get through the FTP gate of NASDAQ heaven. It is sufficient to log on as an anonymous user, giving your email address as the password. In fact, any fake email will do the job. Let’s begin coding in Python:

# How to Get a List of all NASDAQ Securities as a CSV file using Python?
# +tested in Python 3.5.0b2, Mac OS X 10.10.3
#
# (c) 2015 QuantAtRisk.com, by Pawel Lachowicz
 
import os
 
os.system("curl --ftp-ssl anonymous:jupi@jupi.com "
          "ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt "
          "> nasdaq.lst")

Here we use the os module from Python’s Standard Library and the Unix command curl. The latter connects to the FTP server of the NASDAQ exchange, fetches the file nasdaqlisted.txt stored in the SymbolDirectory directory, and saves it in our current folder under the name nasdaq.lst. Note that the credentials are passed without curl’s -u option, so curl most likely treats them as a separate URL; that would explain the two transfers in the progress output and the HTML error page at the top of the downloaded file. During the download curl displays its progress information, e.g.:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0    125      0  0:00:01  0:00:01 --:--:--   125
100  174k  100  174k    0     0  23409      0  0:00:07  0:00:07 --:--:-- 39237
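
For readers who would rather stay inside Python, the same download can be sketched with the Standard Library’s ftplib module instead of shelling out to curl. This is only a sketch under assumptions: the helper names split_ftp_url and fetch_via_ftp and the placeholder email are made up for illustration, it uses plain FTP rather than curl’s --ftp-ssl, and it has not been checked against the NASDAQ server. A file fetched this way would contain only what the FTP server sends, so the line offsets used later in this post may need adjusting.

```python
from ftplib import FTP
from urllib.parse import urlparse

URL = "ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory, filename)."""
    parts = urlparse(url)
    directory, _, filename = parts.path.rpartition("/")
    return parts.hostname, directory, filename


def fetch_via_ftp(url, dest, email="user@example.com"):
    """Download url to dest, logging in anonymously (hypothetical helper).

    The email is sent as the password; the default is a placeholder.
    """
    host, directory, filename = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        with open(dest, "wb") as out:
            ftp.retrbinary("RETR " + filename, out.write)


# No network access happens here; we only show how the URL is split.
print(split_ftp_url(URL))
# fetch_via_ftp(URL, "nasdaq.lst")  # uncomment to attempt the download
```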

Now, in order to inspect the content of the downloaded file we may run in Python an extra line of code, namely:

os.system("head -20 nasdaq.lst")
print()

which displays the first 20 lines of the file:

<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>
Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size
AAIT|iShares MSCI All Country Asia Information Technology Index Fund|G|N|N|100
AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100
AAME|Atlantic American Corporation - Common Stock|G|N|D|100
AAOI|Applied Optoelectronics, Inc. - Common Stock|G|N|N|100
AAON|AAON, Inc. - Common Stock|Q|N|N|100
AAPC|Atlantic Alliance Partnership Corp. - Ordinary Shares|S|N|N|100
AAPL|Apple Inc. - Common Stock|Q|N|N|100
AAVL|Avalanche Biotechnologies, Inc. - Common Stock|G|N|N|100
AAWW|Atlas Air Worldwide Holdings - Common Stock|Q|N|N|100
AAXJ|iShares MSCI All Country Asia ex Japan Index Fund|G|N|N|100
ABAC|Aoxin Tianli Group, Inc. - Common Shares|S|N|N|100
ABAX|ABAXIS, Inc. - Common Stock|Q|N|N|100

As you can see, we are not interested in the first 8 lines of our file: seven lines of an HTML error page followed by the column header row. Before cleaning up that mess, let’s inspect the “happy ending” as well:

os.system("tail -5 nasdaq.lst")
print()

displaying

ZVZZT|NASDAQ TEST STOCK|G|Y|N|100
ZWZZT|NASDAQ TEST STOCK|S|Y|N|100
ZXYZ.A|Nasdaq Symbology Test Common Stock|Q|Y|N|100
ZXZZT|NASDAQ TEST STOCK|G|Y|N|100
File Creation Time: 0624201511:02|||||

Again, we notice that the last line does not make our housewarming party any merrier.
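
Instead of counting the unwanted lines by eye, the header row and the footer can also be located programmatically. The sketch below assumes, based only on the output shown above, that the header row starts with "Symbol|" and that the footer starts with "File Creation Time"; the function name find_payload and the shortened sample are made up for illustration.

```python
def find_payload(lines):
    """Return the data rows between the header row and the footer line."""
    # Assumption: the header row is the first line starting with "Symbol|".
    # If no such line exists, next() raises StopIteration.
    start = next(i for i, line in enumerate(lines) if line.startswith("Symbol|"))
    rows = lines[start + 1:]
    # Assumption: the footer, if present, starts with "File Creation Time".
    if rows and rows[-1].startswith("File Creation Time"):
        rows = rows[:-1]
    return rows


# A shortened stand-in for the downloaded file.
sample = [
    "<html>",
    "</html>",
    "Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size",
    "AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100",
    "ZXZZT|NASDAQ TEST STOCK|G|Y|N|100",
    "File Creation Time: 0624201511:02|||||",
]
payload = find_payload(sample)
print(payload)
```

For the real file, the list of lines could be obtained with open("nasdaq.lst").read().splitlines().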

Given that information, we employ a heavy but smart one-liner built from the immortal Unix commands tail, cat and sed, joined in a pipe (pipeline process). The next call in our Python code therefore performs three miracles in one shot. Have a look:

os.system("tail -n +9 nasdaq.lst | cat | sed '$d' | sed 's/|/ /g' > "
          "nasdaq.lst2")

If you view the output file of nasdaq.lst2 you will see its content to be exactly as we wanted it to be, i.e.:

$ echo; head nasdaq.lst2; echo "..."; tail nasdaq.lst2
 
AAIT iShares MSCI All Country Asia Information Technology Index Fund G N N 100
AAL American Airlines Group, Inc. - Common Stock Q N N 100
AAME Atlantic American Corporation - Common Stock G N D 100
AAOI Applied Optoelectronics, Inc. - Common Stock G N N 100
AAON AAON, Inc. - Common Stock Q N N 100
AAPC Atlantic Alliance Partnership Corp. - Ordinary Shares S N N 100
AAPL Apple Inc. - Common Stock Q N N 100
AAVL Avalanche Biotechnologies, Inc. - Common Stock G N N 100
AAWW Atlas Air Worldwide Holdings - Common Stock Q N N 100
AAXJ iShares MSCI All Country Asia ex Japan Index Fund G N N 100
...
ZNGA Zynga Inc. - Class A Common Stock Q N N 100
ZNWAA Zion Oil & Gas Inc - Warrants G N N 100
ZSAN Zosano Pharma Corporation - Common Stock S N N 100
ZSPH ZS Pharma, Inc. - Common Stock G N N 100
ZU zulily, inc. - Class A Common Stock Q N N 100
ZUMZ Zumiez Inc. - Common Stock Q N N 100
ZVZZT NASDAQ TEST STOCK G Y N 100
ZWZZT NASDAQ TEST STOCK S Y N 100
ZXYZ.A Nasdaq Symbology Test Common Stock Q Y N 100
ZXZZT NASDAQ TEST STOCK G Y N 100

The command of

tail -n +9 nasdaq.lst

prints the file starting from its ninth line, i.e. it skips the first eight lines. That output is pushed into the pipe and passed through cat, which simply hands it on unchanged (so it could be omitted). Next, two sed commands process it: the first removes the last line, and the second replaces every “|” character with a space. Finally, the processed output is saved as the nasdaq.lst2 file. The power of Unix in a single line. After 15 years of using it I’m still smiling to myself doing that :)
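
For comparison, the same three steps can be sketched in plain Python. The function name clean_listing and the shortened sample text are made up for illustration; the default of eight skipped lines simply mirrors the tail -n +9 call above.

```python
def clean_listing(text, skip=8):
    """Mimic: tail -n +9 | sed '$d' | sed 's/|/ /g'."""
    lines = text.splitlines()
    kept = lines[skip:-1]  # drop the first `skip` lines and the last line
    return [line.replace("|", " ") for line in kept]


# Seven HTML lines and one header row, as in the downloaded file,
# followed by two data rows and the footer.
sample_text = "\n".join([
    "<html>",
    "<head><title>403 Forbidden</title></head>",
    '<body bgcolor="white">',
    "<center><h1>403 Forbidden</h1></center>",
    "<hr><center>nginx</center>",
    "</body>",
    "</html>",
    "Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size",
    "AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100",
    "AAPL|Apple Inc. - Common Stock|Q|N|N|100",
    "File Creation Time: 0624201511:02|||||",
])

cleaned = clean_listing(sample_text)
for line in cleaned:
    print(line)
```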

All right. What is left? Getting a list of tickers and storing it into a CSV file. Piece of cake. Here we employ the Unix command of awk in the following way:

os.system("awk '{print $1}' nasdaq.lst2 > nasdaq.csv")
os.system("echo; head nasdaq.csv; echo '...'; tail nasdaq.csv")

which returns

AAIT
AAL
AAME
AAOI
AAON
AAPC
AAPL
AAVL
AAWW
AAXJ
...
ZNGA
ZNWAA
ZSAN
ZSPH
ZU
ZUMZ
ZVZZT
ZWZZT
ZXYZ.A
ZXZZT

i.e. an isolated list of NASDAQ tickers stored in the nasdaq.csv file. From this point, you can read it into a pandas DataFrame as follows:

import pandas as pd
data = pd.read_csv("nasdaq.csv", index_col=None, header=None)
data.columns=["Ticker"]
print(data)

displaying

      Ticker
0       AAIT
1        AAL
2       AAME
3       AAOI
4       AAON
5       AAPC
...
 
[3034 rows x 1 columns]

That’s it.
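
As an aside, the ticker column can also be pulled straight from the pipe-delimited text with Python’s csv module, without the intermediate nasdaq.lst2 file. The sketch below assumes the input starts at the header row (any leading junk already removed) and, judging only by the sample rows shown earlier, that a "Y" in the Test Issue column marks NASDAQ’s test securities. The function name read_tickers and its include_tests switch are made up for illustration.

```python
import csv
import io


def read_tickers(handle, include_tests=True):
    """Return the Symbol column of a pipe-delimited NASDAQ listing."""
    # QUOTE_NONE keeps any quote characters in security names as plain text.
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    tickers = []
    for row in reader:
        # Skip the footer line ("File Creation Time: ...").
        if row["Symbol"].startswith("File Creation Time"):
            continue
        # Optionally drop rows flagged as test issues (assumed meaning of "Y").
        if row["Test Issue"] == "Y" and not include_tests:
            continue
        tickers.append(row["Symbol"])
    return tickers


sample = """Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size
AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100
AAPL|Apple Inc. - Common Stock|Q|N|N|100
ZXZZT|NASDAQ TEST STOCK|G|Y|N|100
File Creation Time: 0624201511:02|||||
"""

all_tickers = read_tickers(io.StringIO(sample))
real_tickers = read_tickers(io.StringIO(sample), include_tests=False)
print(all_tickers)
print(real_tickers)
```

With include_tests left at its default the result matches the list built above, test stocks included; switching it off is one possible way to keep them out of later analysis.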

In the following post, I will make use of that list to fetch the stock trading data and analyse the distribution of extreme values, the gateway to predicting extreme and heavy losses for every portfolio holder (part 2 out of 3). Stay tuned!

DOWNLOADS
   nasdaqtickers.py


  • Idriss Jebir

    Hi, thank you for this post, but I was wondering what the difference is between two files available in the FTP directory. To be specific, what is the difference between “nasdaqlisted.txt” and “nasdaqtraded.txt”? Thank you for your help!
